Abstract



This study compared the difficulty of test items administered 
by paper-and-pencil with the difficulty of the same items 
administered by computer and determined if a mode by ability 
interaction exists. A significant main effect for mode of
administration was found. No significant mode by ability
interaction was found.
The Effects of Mode of Test Administration on Test Performance

Computerized Adaptive Testing (CAT) requires a given examinee 
to interact individually with a computer. Test items are presented 
singly based upon the examinee's responses to previous items. The 
computer program re-estimates the examinee's ability level after he
or she responds to an item, then selects the next item that is most
appropriate to that examinee's re-estimated ability level. The
computer administers and scores the test and records the score. CAT
has been made possible by relatively recent advances in
computer technology and theoretical developments in item response
theory (IRT; Hambleton & Cook, 1977; Lord, 1977; Urry, 1977).
Psychometric interest in adaptive testing stems from the
improved measurement that such testing strategies provide compared
to conventional testing strategies (McBride, 1979; Urry, 1977).
Moreover, practical reasons (Space, 1981), e.g., cost-effectiveness
(Elwood, 1972) and more efficient use of labor (see Gedye & Miller,
1969), have been impetuses for computerizing psychological tests,
regardless of whether an adaptive strategy is employed or not.
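The adaptive loop described above (re-estimate ability after each response, then choose the most appropriate next item) can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of any cited system: it assumes a Rasch (one-parameter logistic) model, a grid-based posterior-mean ability estimate with a standard normal prior, and a "closest difficulty" selection rule. The function names, the item bank, and the deterministic examinee are all hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability that ability `theta` answers an item of difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, grid):
    """Posterior-mean ability over a grid, using a standard normal prior.

    `responses` is a list of (item_difficulty, answered_correctly) pairs.
    """
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

def run_cat(item_bank, answer_fn, n_items):
    """Administer `n_items` items one at a time, adapting to the examinee."""
    grid = [g / 10.0 for g in range(-40, 41)]  # theta from -4.0 to 4.0
    remaining = list(item_bank)
    responses = []
    theta = 0.0  # start at the prior mean
    for _ in range(n_items):
        # Under the Rasch model the most informative remaining item is the
        # one whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer_fn(b)))
        theta = estimate_ability(responses, grid)
    return theta, responses

# Hypothetical item bank and a deterministic examinee who answers
# correctly whenever the item difficulty is below 1.0.
bank = [d / 4.0 for d in range(-12, 13)]  # -3.0 to 3.0 in steps of 0.25
final_theta, administered = run_cat(bank, lambda b: b < 1.0, n_items=10)
```

Each pass through the loop administers one item, records the response, and updates the ability estimate before the next item is chosen, which is the sequence the paragraph describes.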
Although the theoretical basis is extant and the technology is
available, there are still implementation questions which must be
answered (see Johnson, Godin, & Bloomquist, 1981; Johnson &
Johnson, 1981). The effects, if any, of computerized testing
procedures on examinees' performance are not clear. There has not
been a great deal of research in this area. The studies which have
been conducted have provided mixed results. Studies investigating
the reliability (Katz & Dalby, 1981; Lushene, O'Neil, & Dunn, 1974)
and validity (Lushene et al., 1974) of computerized versions of
personality tests have obtained coefficients comparable to those of
the paper-and-pencil forms of the tests. Research involving the
use of computer devices to administer cognitive tests has
provided less consistent findings. Research with the Raven
Progressive Matrices Test (Rock & Nolen, 1982; Hitti, Riffer, &
Stuckles, 1971) indicates that a computerized form of the test is
a viable alternative to the paper-and-pencil form. Other
research (see Hansen & O'Neil, 1970; Hedl, O'Neil, & Hansen,
1973; Johnson & White, 1980; Johnson & Johnson, 1981), however,
suggests that interacting with a computer to complete an intelligence
test may evoke enough anxiety to affect
performance.

A pattern of differences between the two modes of test 
administration according to the specific aptitudes tested has not been 
found. Some examinees have performed better on verbal tests when they
were administered by computer rather than by paper-and-pencil (Serwer &
Stolurow, 1970; Johnson & Mihal, 1973), while other examinees have
performed more poorly on verbal tests administered by computer (Johnson &
Mihal, 1973; Wildgrube, 1982) rather than by paper-and-pencil. Still
other examinees have shown no difference in performance between the
two modes on verbal tests (Sachar & Fletcher, 1977) or tests which
require memory retrieval (English, Reckase, & Patience, 1977; Hoffman
& Lundberg, 1976). Similarly, no pattern has been found for quantitative
ability. Johnson and Mihal's (1973) subjects performed better
on quantitative tests when the tests were computer administered. In
contrast, Wildgrube (1982) found no significant differences in
performance between the modes for arithmetic reasoning. Studies
involving other nonverbal tests, e.g., figural reasoning (Wildgrube,
1982) and analytical processing (Sachar & Fletcher, 1977), have also
produced mixed results.

In summary, the effects of mode of test presentation on performance
are not clear. Conflicting findings in previous research might be
due to differences in methodology. Finding differences between modes
might depend upon test content (e.g., personality tests vs. cognitive
tests, easy tests vs. difficult tests, or verbal tests vs. quantitative
tests), the population tested (e.g., blacks vs. whites or naive
subjects vs. experienced subjects), or the design of the study (e.g.,
repeated measures vs. independent groups or sample size).

The purpose of this study was (1) to compare the mean difficulty
of test items which were administered by paper-and-pencil with the
mean difficulty of the same items administered by computer and (2) to
determine if an interaction between mode of test administration and
ability exists.

Methods 

Subjects

Subjects were 654 male Marine Corps recruits between the ages of
18 and 25, stationed at the Marine Corps Recruit Depot (MCRD), San
Diego, California. The paper-and-pencil test was administered to 334
recruits and the computerized test was administered to 320 recruits.

Procedure

A 30-item arithmetic reasoning test was constructed for this
study. The number of items that an examinee answered correctly on
the experimental arithmetic reasoning test (EXP-AR) was the dependent
variable. In addition, all subjects had taken the Armed Services
Vocational Aptitude Battery (ASVAB) approximately two weeks to six
months prior to the experimental test. A given subject's number-correct
score for the Arithmetic Reasoning subtest of the ASVAB
(ASVAB-AR) was used as an independent estimate of that subject's
arithmetic reasoning ability.

EXP-AR was administered to participants approximately 24 hours
after their arrival at the MCRD receiving barracks. Each subject was
randomly assigned to one of the two modes of test administration.

Subjects in the paper-and-pencil mode were tested in groups of 4
to 10. Each subject was given a test booklet containing test instructions,
three sample questions, and the 30 test items. There were approximately
eight items per page. Item responses were recorded on an answer
sheet. It was possible for examinees to refer to previous items and
to change their answers.

Subjects in the computer mode were tested in groups of four,
using cathode-ray tube terminals. Test instructions were presented
by the computer. The instructions were written to be as similar as
possible to those given in the paper-and-pencil mode, except that additional
instructions on the use of the computer terminal were given. The
same three sample questions that were given in the paper-and-pencil
mode were administered by the computer. Each sample and test item
was displayed individually on the screen. Keys used to enter item
responses were specially labelled with bold, black letters on a white
background. It was not possible for examinees in the computer mode
to refer to previous items or to change their answers once the
answer had been entered on the keyboard and recorded by the computer.

Time limits were not imposed and omitting items was not
allowed in either mode of administration.

Results 

Sixty-nine subjects were deleted from the original sample because
of incomplete data. The final sample size was 585, with 300 in the
paper-and-pencil mode and 285 in the computer mode.

Linear regression analysis was used to perform an analysis of
covariance, with EXP-AR as the dependent variable, mode of test
administration as the independent variable, and ASVAB-AR as the
covariate.

A significant main effect for mode of administration was found 
(p<.01). 

As shown in Table 1, the mean ASVAB-AR number-correct scores for
the two groups were very close in value. This indicates that, on the
basis of arithmetic reasoning ability, random assignment to groups
was successful. Mean number-correct scores for the experimental test
given under the two modes of administration were significantly
different from each other. Regression analysis was used to further
investigate this difference.
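As a rough check on the group comparisons summarized in Table 1, the t statistics can be recomputed from the reported group sizes, means, and standard deviations (EXP-AR: 19.31 and 18.27 with standard deviations 5.62 and 5.81; ASVAB-AR: 20.66 and 20.65 with standard deviations 5.27 and 5.31). The sketch below assumes a pooled-variance independent-groups t test, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-groups t statistic using a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df

# Paper-and-pencil group first, computer group second.
t_exp, df_exp = pooled_t(300, 19.31, 5.62, 285, 18.27, 5.81)      # EXP-AR
t_asvab, df_asvab = pooled_t(300, 20.66, 5.27, 285, 20.65, 5.31)  # ASVAB-AR
print(round(t_exp, 2), round(t_asvab, 2), df_exp)
```

Because the inputs are rounded to two decimals, the recomputed EXP-AR value can differ from the tabled 2.19 in the second decimal place; the ASVAB-AR value stays close to the tabled .02.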
The following regression model was used to test for a significant
interaction between mode of administration and ability:

$$ E(Y) = B_0 + B_1 X + B_2 M + B_3 (MX), $$

where Y was the EXP-AR score, X was the pre-enlistment ASVAB-AR score
(the covariate), M was +1 if the examinee was in the paper-and-pencil
group and -1 if the examinee was in the computer group, and MX was the
product of M and X (the interaction term). The B symbolizes raw-score
regression weights.

Results showed $B_3$ was not significantly different from zero,
indicating that there was no significant interaction between ability
and mode of administration. The effect of ability (as measured by
ASVAB-AR) on the dependent measure (EXP-AR) was the same regardless
of mode of test administration. Therefore, the following model was
the appropriate one to fit:

$$ E(Y) = B_0 + B_1 X + B_2 M. $$

The multiple regression coefficient for this model was .75, with
$B_1 = .8102$, F = 747.702, and $B_2 = .5133$, F = 10.793, p < .01.
Since $B_2$ was significantly different from zero, this indicates the
presence of a main effect for mode of test administration.
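Raw-score weights for models of this form can be estimated by ordinary least squares. The following is a minimal sketch, not the authors' analysis: it uses noise-free synthetic data rather than the study's data, solves the normal equations directly, and omits the F tests. The helper names (`solve`, `ols`, `design`) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares weights from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def design(x, m, interaction):
    """Rows [1, X, M] or [1, X, M, M*X]; M is +1 (paper) or -1 (computer)."""
    return [[1.0, xi, mi] + ([mi * xi] if interaction else [])
            for xi, mi in zip(x, m)]

# Synthetic, noise-free data (NOT the study's data): Y = 2.0 + 0.8 X + 0.5 M,
# so there is a mode main effect but no mode by ability interaction.
x = [float(v) for v in range(10, 30)]
m = [1.0 if i % 2 == 0 else -1.0 for i in range(len(x))]
y = [2.0 + 0.8 * xi + 0.5 * mi for xi, mi in zip(x, m)]

full = ols(design(x, m, interaction=True), y)      # [B0, B1, B2, B3]
reduced = ols(design(x, m, interaction=False), y)  # [B0, B1, B2]
```

With these synthetic data the interaction weight in `full` comes out at essentially zero, and `reduced` recovers the three generating weights, mirroring the step from the interaction model to the main-effects model.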

Information from the regression analysis was used to obtain the two
within-group regressions of EXP-AR on ASVAB-AR. Figure 1 shows a
plot of these regression lines, superimposed upon a scatterplot in
which ASVAB-AR number-correct score is on the horizontal axis and
EXP-AR is on the vertical axis. The difference between the intercepts
for these two parallel lines was 1.0277. The means for the
paper-and-pencil group and the computer group, "adjusted for" the ASVAB-AR
covariate, were 19.31 and 18.28, respectively. Therefore, subjects
in the paper-and-pencil mode of test administration scored, on the
average, 1.03 raw-score points above the subjects in the computer
mode.
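The size of this gap follows from the +1/-1 coding of M: the paper-and-pencil line has its intercept shifted up by the mode weight and the computer line has it shifted down by the same amount, so the two parallel lines sit twice the mode weight apart. A small arithmetic sketch, using the mode weight of .5133 and the adjusted means reported in the text (variable and function names are hypothetical):

```python
b2 = 0.5133  # raw-score weight for M, as reported in the text

def mode_gap(b2_weight):
    """Paper-minus-computer intercept difference under +1 / -1 coding."""
    paper_shift = b2_weight * (+1)     # M = +1 for paper-and-pencil
    computer_shift = b2_weight * (-1)  # M = -1 for computer
    return paper_shift - computer_shift

gap_from_weight = mode_gap(b2)           # 2 * B2
gap_from_adjusted_means = 19.31 - 18.28  # adjusted means from the text
```

Twice the printed weight is 1.0266, slightly below the reported 1.0277; the small discrepancy presumably reflects rounding or transcription of the printed weight. Both agree with the 1.03 difference in adjusted means to two decimals.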

Item analysis was performed to determine if the effect of mode
of test administration was the same over all test items or if some
items were affected more than others. Figure 2 shows a scatterplot
of the p-values for the two groups. Twenty-one of the 30 items were
more difficult in the computer mode, while only three were more
difficult in the paper-and-pencil mode. The remaining six items were
of approximately equivalent difficulty. This result shows that item
difficulty was affected by mode of administration and that this
effect was fairly constant across items.
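An item analysis of this kind can be sketched as follows: compute each item's p-value (proportion of examinees answering correctly) within each mode, then classify items by which mode shows the lower p-value. The response matrices below are hypothetical toy data, and the tolerance used for "approximately equivalent" is an assumption, since the text does not state the criterion that was used.

```python
def p_values(responses):
    """Proportion correct per item; `responses` holds one 0/1 row per examinee."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n
            for i in range(len(responses[0]))]

def compare_modes(p_paper, p_computer, tolerance=0.01):
    """Count items that are harder in each mode or roughly equal in both."""
    counts = {"harder_on_computer": 0, "harder_on_paper": 0, "equivalent": 0}
    for pp, pc in zip(p_paper, p_computer):
        diff = pp - pc  # positive: fewer correct answers in the computer mode
        if abs(diff) <= tolerance:
            counts["equivalent"] += 1
        elif diff > 0:
            counts["harder_on_computer"] += 1
        else:
            counts["harder_on_paper"] += 1
    return counts

# Hypothetical toy data: 4 examinees per mode, 3 items.
paper = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0]]
computer = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 1, 1]]

p_paper = p_values(paper)        # [0.75, 0.75, 0.5]
p_computer = p_values(computer)  # [0.5, 0.75, 0.75]
summary = compare_modes(p_paper, p_computer)
```

Applied to the study's 30 items, a tally of this form would correspond to the reported split of 21, 3, and 6 items, given whatever equivalence criterion the authors applied.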

Implications and Conclusions 

The obtained main effect for mode was unexpected. It is not
obvious what caused the computerized test to be more difficult. The
anxiety level may have been significantly higher in the computer
mode, adversely affecting performance (see Hansen & O'Neil,
1970; Hedl, O'Neil, & Hansen, 1973; Johnson & Johnson, 1981). More
training in the use of computers to alleviate possible
computer-evoked anxiety is suggested in future research and applications.

On the other hand, past research has failed to consistently find 
significant differences between the two modes of presentation, 
without specifically controlling for anxiety. Moreover, the pattern
of differences between the modes across different abilities has not
been consistently replicated across studies. Alternatively, the
number of items present at a given time (e.g., eight in the
paper-and-pencil mode vs. one in the computerized mode) may significantly
affect performance on certain types of items (see Hoffman & Lundberg,
1976). The results from the current study indicate that more research
is needed to corroborate the existence of significant differences
between the modes. Further research is especially needed to identify
the specific factors affecting test performance in the two modes.
Table 1

Descriptive Statistics for Experimental AR and
ASVAB AR Broken Down by Experimental Group

                             N of            Std.      T     2-Tail
Variable                     Cases   Mean    Dev.    Value   Prob.

Experimental AR                                       2.19    .03
  Paper-and-Pencil Group      300    19.31   5.62
  Computer Group              285    18.27   5.81

ASVAB AR                                               .02    .93
  Paper-and-Pencil Group      300    20.66   5.27
  Computer Group              285    20.65   5.31
Figure 1. Performance (number-correct scores) on the Experimental Arithmetic
Reasoning Test (Experimental AR) as a function of performance (number-correct
scores) on the Armed Services Vocational Aptitude Battery Arithmetic Reasoning
subtest (ASVAB AR). The solid line is the regression line for the
paper-and-pencil mode of test administration; the dashed line is the
regression line for the computer mode of test administration. The circles (o)
represent data points for the paper-and-pencil mode; the crosses (x) represent
data points for the computer mode.
Figure 2. A scatterplot of the item difficulty indices (p-values) for the
paper-and-pencil mode of test administration and the computer mode of
administration.
